Clustering Distributed Homogeneous Datasets
Abstract
In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be mined independently to generate the rules appropriate to that cluster. The algorithm is efficient in storage and scale, can adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.
Similar resources
Entropy-based Consensus for Distributed Data Clustering
The ever-larger scale of available data and increasingly restrictive concerns about its privacy are among the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e., it is the negotiations among local cluster centers that are used in t...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method finds fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Triangular fuzzy numbers are also used to represent uncertain data. To compare triangular fuzzy ...
Exploiting Dataset Similarity for Distributed Mining
The notion of similarity is an important one in data mining. It can be used to provide useful structural information on data as well as enable clustering. In this paper we present an elegant method for measuring the similarity between homogeneous datasets. The algorithm presented is efficient in storage and scale, has the ability to adjust to time constraints, and can provide the user with likely...
A Comparative Study of Issues in Big Data Clustering Algorithm with Constraint Based Genetic Algorithm for Associative Clustering
Clustering can be defined as the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups, called clusters. The growing need for distributed clustering algorithms is attributed to the huge size of databases that is common nowadays. The task of extracting knowledge from large databases, in the form of clustering rules, has attracted considerable attention. Distri...
Hierarchical Intuitionistic Fuzzy Possibilistic C Means Kernel Clustering Algorithm for Distributed Networks
Advances in distributed networking have resulted in an explosion in the size of modern datasets, while storage and processing power continue to lag behind. This calls for algorithms that are efficient in terms of number of measurements and running time. To combat challenges associated with large datasets in distributed networks we propose hierarchical intuitionistic fuzzy possibilistic c...
Publication date: 2000